Language as an Evolving Word Web 
Human language can be described as a complex network of linked words. In such a treatment, 
each distinct word in language is a vertex of this web, and neighboring words in sentences are 
connected by edges. It was recently found [1] that the distribution of the numbers of connections 
of words in such a network is of a peculiar form which includes two pronounced power-law regions. 
Here we treat language as a self-organizing network of interacting words. In the framework of this 
concept, we completely describe the observed Word Web structure without fitting. 



How language evolves is a major challenge for linguistics
and evolutionary biology and an intriguing problem for other
sciences [5]-[10]. The recent explosion of interest in networks
[10]-[15], including the World Wide Web and the Internet
[16]-[19], biological networks, social [21] and ecological webs
[22], networks of collaborations [23], etc., had an immediate
consequence: the treatment of human language as a complex
network of distinct words [1].

This Word Web is arranged in the following way. The vertices
of the web are the distinct words of language, and the
undirected edges are connections between interacting words. The
notion of word interaction is hard to define uniquely.
Nevertheless, different reasonable definitions provide very
similar structures of the Word Web. For instance, one can
connect the nearest neighbors in sentences. Without going into
detail, this means that an edge between two distinct words of
language exists if these words are nearest neighbors in at
least one sentence in the bank of language. In such a
definition, multiple links are absent. One may also connect the
second nearest neighbors and account for other types of
correlations between words.
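
The nearest-neighbor construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_word_web`, the regular-expression tokenizer, and the two-sentence toy corpus are assumptions made here, not the procedure used to build the empirical Word Web.

```python
import re
from collections import defaultdict

def build_word_web(sentences):
    """Return {word: set of neighbouring words} for adjacent words in sentences."""
    adjacency = defaultdict(set)
    for sentence in sentences:
        # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", sentence.lower())
        for left, right in zip(words, words[1:]):
            if left != right:               # edges join two distinct words only
                adjacency[left].add(right)  # sets discard repeated pairs,
                adjacency[right].add(left)  # so multiple links are absent
    return adjacency

corpus = ["The cat sat on the mat.", "The dog sat on the cat."]
web = build_word_web(corpus)
degrees = {word: len(neighbours) for word, neighbours in web.items()}
print(sorted(degrees.items(), key=lambda item: -item[1]))
```

In this toy corpus the word "the" collects the most neighbors, which hints at how frequent words become hubs of the web.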

Recently it was found that this network has a complex
architecture [1] which differs dramatically from the classical
random graphs extensively studied in mathematical graph theory.
In Ref. [1] the basic informative characteristic of the Word
Web, the distribution of the numbers of connections of words,
was obtained empirically. In graph theory, the number of
connections of a vertex is called its degree. The observed
degree distribution of the Word Web has a long tail, unlike the
Poisson degree distribution of classical random graphs. This
indicates that the Word Web belongs to the same class as the
World Wide Web and the Internet.

Moreover, the degree distribution obtained in Ref. [1] has a
complex form. It consists of two power-law parts with different
exponents. This hampers any treatment but, on the other hand,
makes it possible to explain the basic structure of the Word
Web within a general concept. Indeed, if one proposes a model
which, without fitting, describes the empirical degree
distribution and reproduces the values of all the
characteristic scales, the announced aim will be achieved (it
is hardly possible to describe such a complex form perfectly by
coincidence). Here we present the solution of this problem.

Human language is certainly an evolving system. Its present
structure is determined by its past evolution. This system is
so complex that it cannot be controlled but rather organizes
itself while growing. We treat language as a growing network of
interacting words. At its birth, a new word already interacts
with several old ones. New interactions between old words
emerge from time to time, and new edges arise.

How do words find their collaborators in language? Here we use
the idea of preferential linking (preferential attachment of
new edges to vertices with higher numbers of connections) [13].
This fruitful idea is a particular realization of the general
concept of Simon. The simplest linear form of preferential
linking provides power-law degree distributions for nets in
which the average number of connections per vertex (the average
degree) does not change during the growth. If the total number
of connections increases faster than the number of vertices,
and the average degree grows, the exponent of the degree
distribution takes a different value [26]. To explain the
resulting structure of the Word Web, we combine these two
processes of edge emergence.

We use the following rules of the network growth (see Fig. 1).
At each time step, a new vertex (word) is added to the network,
and the total number of vertices, $t$, plays the role of time.
At its birth, the new word connects to several old ones. We do
not know the original number of connections. We only know that
it is of the order of 1. It would be unfair to play with an
unknown parameter to fit the experimental data, so we set this
number to 1. We use the simplest natural version of
preferential linking, so a new word is connected to some old
word $i$ with probability proportional to its degree $k_i$, as
in the Barabási-Albert model [13]. In addition, at each
increment of time, $ct$ new edges emerge between old words,
where $c$ is a constant coefficient that characterizes a
particular network. The linear dependence appears if each
vertex makes new connections at a constant rate, and we choose
it as the simplest and most natural. These new edges emerge
between old words $i$ and $j$ with probability proportional to
the product of their degrees $k_i k_j$.
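
The growth rules above can be simulated directly. The following is a minimal sketch, not the authors' code: the function name `grow_word_web`, the two-word seed network, the rejection of self-loops, the toleration of repeated edges between the same pair, and the small parameter values are all assumptions made for illustration. A vertex is drawn with probability proportional to its degree by picking a uniformly random entry from a list of edge ends.

```python
import random

def grow_word_web(steps, c, seed=0):
    """Grow the network until it has `steps` words; return the vertex degrees."""
    rng = random.Random(seed)
    degree = [1, 1]   # assumed seed: two words joined by one edge
    ends = [0, 1]     # vertex i appears degree[i] times in this list
    carry = 0.0       # accumulates the fractional part of c*t
    for t in range(2, steps):
        # c*t new edges between old words, chosen with probability ~ k_i * k_j
        carry += c * t
        new_edges = int(carry)
        carry -= new_edges
        for _ in range(new_edges):
            i = rng.choice(ends)
            j = rng.choice(ends)
            while j == i:              # assumption: no self-loops
                j = rng.choice(ends)
            degree[i] += 1
            degree[j] += 1
            ends.extend((i, j))
        # the new word t attaches to one old word chosen with probability ~ k_i
        target = rng.choice(ends)
        degree.append(1)
        degree[target] += 1
        ends.extend((t, target))
    return degree

degrees = grow_word_web(steps=1000, c=0.01)
mean_degree = sum(degrees) / len(degrees)
print(f"mean degree = {mean_degree:.2f}, expected about {2 + 0.01 * 1000:.2f}")
```

The simulated mean degree should stay close to $2 + ct$, since each step adds one edge from the new word and about $ct$ edges between old words.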

Two slightly different methods were used in Ref. [1] to
construct the Word Web. The two resulting webs, obtained after
processing 3/4 million words of the British National Corpus (a
collection of text samples of both spoken and written modern
British English), have nearly the same degree distributions,
and each one contains about 470,000 vertices. The average
number of connections of a word (the average degree) is
$\bar{k} \approx 72$. These are the only parameters of the Word
Web we know and can use in the model.

This stochastic model can be solved exactly but here, for
simplicity of presentation, we use the continuous
approximation. Such an approach has been shown to describe
quite well the degree distributions of networks growing under
the mechanism of preferential linking [26]-[29]. In our case,
it provides the nonstationary degree distribution $P(k,t)$,
which is very close to the exact one everywhere except in the
narrow region $k < 10$. We emphasize that the continuous
approach yields the exact values of the exponents of the
distribution.

In the continuous approximation, the degrees of the vertices
born at time $s$ and observed at time $t$ are substituted by
their average value $\bar{k}(s,t)$. For a large network, the
evolution of $\bar{k}(s,t)$ is described by the simple equation

$$\frac{\partial \bar{k}(s,t)}{\partial t} = (1 + 2ct)\,\frac{\bar{k}(s,t)}{\int_0^t du\,\bar{k}(u,t)} \qquad (1)$$

with the obvious boundary condition $\bar{k}(t,t) = 1$. The
nature of this equation can be easily understood. The ratio on
the right-hand side is a direct consequence of preferential
attachment. At each time step, $1 + 2ct$ ends of new edges are
distributed preferentially. Indeed, one such end belongs to the
edge coming from the new word, and the others are the ends of
the $ct$ new edges emerging between old words. Here we have
presented heuristic arguments, but Eq. (1) can be derived more
rigorously.

One sees that the total degree of the network is
$\int_0^t du\,\bar{k}(u,t) = 2t + ct^2$, so its average degree
is $\bar{k}(t) = 2 + ct$. The present value of the average
degree of the Word Web is close to 72, hence
$1 \ll ct \approx 70$. The solution of Eq. (1) is of a singular
form,

$$\bar{k}(s,t) = \left(\frac{ct}{cs}\right)^{1/2} \left(\frac{2+ct}{2+cs}\right)^{3/2} \qquad (2)$$

which indicates the presence of two distinct regimes in this
problem. From Eq. (2) we immediately obtain the nonstationary
degree distribution

$$P(k,t) = \frac{1}{ct}\,\frac{cs\,(2+cs)}{1+2cs}\,\frac{1}{k} \qquad (3)$$

where $s = s(k,t)$ is the solution of Eq. (2).
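
As a sanity check, one can verify numerically that the closed form $\bar{k}(s,t) = (t/s)^{1/2}\,[(2+ct)/(2+cs)]^{3/2}$ satisfies the growth equation $\partial \bar{k}/\partial t = (1+2ct)\,\bar{k}/(2t+ct^2)$ and integrates to the total degree $2t + ct^2$. The sketch below assumes this reading of the solution; the helper names, the test vertex $s = 1000$, and the midpoint-rule integration are choices made here for illustration.

```python
import math

T = 470_000.0      # network size (number of words)
C = 70.0 / T       # chosen so that c*t = 70

def k_bar(s, t, c=C):
    """Closed-form average degree at time t of the vertex born at time s."""
    return math.sqrt(t / s) * ((2 + c * t) / (2 + c * s)) ** 1.5

def total_degree(t, c=C, n=20_000):
    """Midpoint-rule estimate of the integral of k_bar(u, t) over 0 < u < t.

    The substitution u = t*x**2 removes the integrable u**-0.5 singularity.
    """
    h = 1.0 / n
    acc = 0.0
    for m in range(n):
        x = (m + 0.5) * h
        u = t * x * x
        acc += k_bar(u, t, c) * 2.0 * t * x * h   # du = 2*t*x dx
    return acc

s, dt = 1_000.0, 1.0
lhs = (k_bar(s, T + dt) - k_bar(s, T - dt)) / (2 * dt)      # d k_bar / d t
rhs = (1 + 2 * C * T) * k_bar(s, T) / (2 * T + C * T ** 2)  # right-hand side
print(lhs, rhs)
print(total_degree(T), 2 * T + C * T ** 2)
```

Both printed pairs should agree to many digits, which supports the two exponents 1/2 and 3/2 appearing in the solution.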

One sees from Eqs. (2) and (3) that this nonstationary
distribution has two regions with different behaviors,
separated by the crossover point
$k_{cross} \approx \sqrt{ct}\,(2+ct)^{3/2}$. The crossover moves
in the direction of large degrees while the network grows.
Below this point, the degree distribution is stationary,
$P(k) \cong \frac{1}{2} k^{-3/2}$ (we use the fact that in the
Word Web $ct \gg 1$). Above the crossover point, we obtain the
behavior $P(k,t) \cong \frac{1}{4}(ct)^3 k^{-3}$, so that the
degree distribution is nonstationary in this region. Thus, the
model provides two distinct values for the degree distribution
exponent, 3/2 and 3.
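
The two exponents can be checked numerically by treating the birth time $s$ as a parameter along the curve $(k, P)$. The sketch below assumes the forms $\bar{k}(s,t) = (t/s)^{1/2}[(2+ct)/(2+cs)]^{3/2}$ and $P = s(2+cs)/[t(1+2cs)\,k]$; the function names and the two sample birth times are illustrative choices.

```python
import math

T = 470_000.0      # network size
C = 70.0 / T       # so that c*t = 70

def k_bar(s, t=T, c=C):
    """Average degree at time t of the vertex born at time s."""
    return math.sqrt(t / s) * ((2 + c * t) / (2 + c * s)) ** 1.5

def p_of_s(s, t=T, c=C):
    """Degree distribution P(k, t) evaluated at k = k_bar(s, t)."""
    return s * (2 + c * s) / (t * (1 + 2 * c * s) * k_bar(s, t, c))

def local_exponent(s, eps=1e-4):
    """Log-log slope d ln P / d ln k around the vertex born at time s."""
    lo, hi = s * math.exp(-eps), s * math.exp(eps)
    d_ln_p = math.log(p_of_s(hi)) - math.log(p_of_s(lo))
    d_ln_k = math.log(k_bar(hi)) - math.log(k_bar(lo))
    return d_ln_p / d_ln_k

young = local_exponent(0.5 * T)    # c*s = 35: recent words, small degrees
old = local_exponent(0.001 / C)    # c*s = 0.001: early words, large degrees
print(young, old)
```

The slope for recently born words should come out close to -3/2 and the slope for the earliest words close to -3.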

The degree distribution has one more important characteristic
point, the cutoff produced by the size effect. Its position
$k_{cut}$ is easily estimated from the condition that one
vertex in the network has degree exceeding $k_{cut}$, that is,
$t \int_{k_{cut}}^{\infty} dk\,P(k) \sim 1$, and thus
$k_{cut} \sim \sqrt{t/8}\,(ct)^{3/2}$. Here we do not present
the complete exact result, which can be obtained using the
master equation approach. The infinite-time limit of the exact
degree distribution takes the simple form
$P(k, t \to \infty) = \frac{1}{2} B(k, 3/2)$, where
$B(\cdot,\cdot)$ is the beta function. Minor deviations from
the continuous approximation are visible only for $k < 10$.
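
The limiting form can be evaluated with the log-gamma function from the Python standard library. The sketch assumes the reading $P(k) = \frac{1}{2} B(k, 3/2)$ with $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$; the function name `p_exact` and the sample values of $k$ are illustrative.

```python
import math

def p_exact(k):
    """Assumed limiting form P(k) = (1/2) * B(k, 3/2), computed via log-gamma."""
    log_beta = math.lgamma(k) + math.lgamma(1.5) - math.lgamma(k + 1.5)
    return 0.5 * math.exp(log_beta)

# Log-log slope of the tail between k = 1000 and k = 2000.
tail_slope = (math.log(p_exact(2000)) - math.log(p_exact(1000))) / math.log(2)

# Partial normalization sum; the remainder sits in the slowly decaying tail.
partial_sum = sum(p_exact(k) for k in range(1, 100_001))

print(p_exact(1), tail_slope, partial_sum)
```

Under this reading, the value at $k = 1$ is 1/3, the tail slope approaches -3/2 (the same exponent as in the continuous approximation), and the partial sum approaches 1 from below.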

In Fig. 2, we plot the degree distribution of the model (the
solid line). To obtain the theoretical curve, we used Eqs. (2)
and (3) with the known parameters of the Word Web. The
deviations from the continuous approximation are accounted for
in the small-$k$ region, $k < 10$. One sees that the agreement
with the empirical data [1] is excellent. Note that we do not
use any fitting. For a better comparison, in Fig. 2, the
theoretical curve is displaced upward (we have to exclude the
two experimental points with the smallest $k$ since these
points depend on the method of construction of the Word Web,
and any comparison in this region is meaningless in principle).

From the relations obtained above, we find the characteristic
values for the crossover and cutoff,
$k_{cross} \approx 5.1 \times 10^3$, that is,
$\log_{10} k_{cross} \approx 3.7$, and
$\log_{10} k_{cut} \approx 5.2$. From Fig. 2, one sees that
these values coincide with the experimental ones. As far as we
know, this is the first time that such complex empirically
obtained data for networks have been described without fitting.
We should emphasize that the extent of agreement is truly
surprising. The minimal model does not account for numerous
factors that seem important at first sight, e.g., the death of
words, the variations of words during the evolution of
language, etc. The agreement is convincing since it holds over
the whole range of values of $k$, that is, over five decades.
In fact, the Word Web turns out to be very convenient in this
respect since the total number of edges in it is extremely
high, about $3.4 \times 10^7$ edges, and the value of the
cutoff degree is large.
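
The quoted values of the crossover and cutoff follow from simple arithmetic on the two known parameters. The sketch below assumes the estimates $k_{cross} \approx \sqrt{ct}\,(2+ct)^{3/2}$ and $k_{cut} \sim \sqrt{t/8}\,(ct)^{3/2}$, with $ct$ obtained from the average degree $\bar{k} = 2 + ct$; the variable names are illustrative.

```python
import math

t = 470_000          # number of distinct words (vertices)
k_mean = 72          # average degree of the Word Web
ct = k_mean - 2      # from k_mean = 2 + c*t

k_cross = math.sqrt(ct) * (2 + ct) ** 1.5   # crossover between the two regimes
k_cut = math.sqrt(t / 8) * ct ** 1.5        # size-effect cutoff

print(f"k_cross ~ {k_cross:.3g}, log10 k_cross = {math.log10(k_cross):.1f}")
print(f"k_cut   ~ {k_cut:.3g}, log10 k_cut   = {math.log10(k_cut):.1f}")
```

No parameter is adjusted here: both scales are fixed by the size of the web and its average degree alone.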

Note that few words are in the region above the crossover
point $k_{cross} \approx 5.1 \times 10^3$. With the growth of
language, $k_{cross}$ increases rapidly but, as follows from
our relations, the total number of words of degree greater than
$k_{cross}$ does not change. It is a constant of the order of
$1/(8c) \approx t/(8\bar{k}) \sim 10^3$, that is, of the order
of the size of the small set of words forming the kernel
lexicon of British English, which was estimated as 5,000 words
and is the most important core part of language. Therefore, our
concept suggests that the number of words in this part of
language does not depend essentially on the size of language.
Formally speaking, this number is determined by the value of
the average rate $c$ with which words find new partners in
language.
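
The estimate of the kernel size can be written out explicitly. The short derivation below is a sketch that assumes the large-degree form $P(k,t) \cong \frac{1}{4}(ct)^3 k^{-3}$ holds all the way down to $k_{cross}$ and uses $k_{cross} \approx (ct)^2$ and $\bar{k} \approx ct$, both valid for $ct \gg 1$; the symbol $N$ for the number of words above the crossover is introduced here for convenience.

```latex
\begin{align*}
N(k > k_{cross})
  &= t \int_{k_{cross}}^{\infty} dk\, \tfrac{1}{4}(ct)^3 k^{-3}
   = \frac{t\,(ct)^3}{8\,k_{cross}^2} \\
  &\approx \frac{t\,(ct)^3}{8\,(ct)^4} = \frac{1}{8c}, \\
\frac{1}{8c}
  &\approx \frac{t}{8\bar{k}} = \frac{470{,}000}{8 \times 72}
   \approx 816 \sim 10^3 .
\end{align*}
```

The time $t$ cancels in the second line, which is why the number of words in the kernel stays constant while the network grows.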

There exist many obvious ways to improve the minimal model
used above. Nevertheless, at present, such attempts seem rather
meaningless since, as we have noted, it is hard to define
rigorously the procedure of the Word Web construction, and the
experimental data do not allow us to make a better comparison.

We have proposed a simple stochastic theory of the evolution
of human language based on the treatment of language as an
evolving network of interacting words. The structure of
language is the result of the self-organization of the Word Web
during its growth. The key result is the distribution of the
numbers of connections of words. We have found that the
self-organization produces the most connected small kernel
lexicon of language, the size of which does not change
essentially during the evolution of language. The degree
distribution of words in this core of language differs
crucially from the degree distribution of the rest. We have
shown that the basic characteristic of the Word Web structure,
namely the degree distribution, does not depend on the rules of
language but is determined by the general principles of the
evolutionary dynamics of the Word Web. We would like to note
that the successful description is important since the recent
progress in the understanding of numerous stochastic
multiplicative processes in Nature is based on the famous Simon
model, which was originally applied to the description of the
structure of human language.

FIG. 1. Scheme of the Word Web growth. At each time step a new
word appears, so $t$ is the total number of words. It connects
to some preferentially chosen old word. Simultaneously, $ct$
new edges emerge between pairs of preferentially chosen old
words. All the edges are undirected. We use the simplest kind
of preferential attachment, in which a node is chosen with
probability proportional to the number of its connections.

FIG. 2. The distribution of the numbers of connections
(degrees) of words in the Word Web on a log-log scale. The
solid line is the result of our calculation using the
parameters of the Word Web, the size $t \approx 470{,}000$ and
the average number of connections $\bar{k}(t) \approx 72$.
Empty and filled circles show the distributions of the numbers
of connections obtained in Ref. [1] for the two different
methods of construction of the Word Web. In the region
$k < 10$, where the deviations of the continuous approximation
from the exact solution of the model are noticeable, we present
the exact solution. The arrows indicate the theoretically
obtained point of crossover, $k_{cross}$, between the regions
with the exponents 3/2 and 3, and the cutoff $k_{cut}$ of the
power-law dependence due to the size effect. For a better
comparison, the theoretical curve is displaced upward (note
that the comparison is impossible in the region of the smallest
$k$, where the experimentally obtained distribution essentially
depends on the definition of the Word Web).






